Light Higgs boson discovery from fermion mixing 



J. A. Aguilar-Saavedra 

Departamento de Física Teórica y del Cosmos and CAFPE,
Universidad de Granada, E-18071 Granada, Spain 

Abstract 

We evaluate the LHC discovery potential for a light Higgs boson in $t\bar tH$ ($\to \ell\nu b\bar b b\bar b jj$)
production, within the Standard Model and if a new $Q = 2/3$ quark singlet $T$ with
a moderate mass exists. In the latter case, $T$ pair production with decays $T\bar T \to
W^+ b\, H\bar t \,/\, Ht\, W^-\bar b \to W^+ b\, W^-\bar b\, H$ provides an important additional source of Higgs bosons
giving the same experimental signature, and other decay modes $T\bar T \to Ht\, H\bar t \to W^+ b\, W^-\bar b\, HH$,
$T\bar T \to Zt\, H\bar t \,/\, Ht\, Z\bar t \to W^+ b\, W^-\bar b\, HZ$ further enhance this signal. Both analyses
are carried out with particle-level simulations of signals and backgrounds, including $t\bar t$
plus $n = 0, \dots, 5$ jets which constitute the main background by far. Our estimate for SM
Higgs discovery in $t\bar tH$ production, $0.4\sigma$ significance for $M_H = 115$ GeV and an integrated
luminosity of 30 fb$^{-1}$, is similar to the most recent ones by CMS which also include the
full $t\bar t nj$ background. We show that, if a quark singlet with a mass $m_T = 500$ GeV exists,
the luminosity required for Higgs discovery in this final state is reduced by more than
two orders of magnitude, and $5\sigma$ significance can be achieved already with 8 fb$^{-1}$. This
new Higgs signal will not be seen unless we look for it: with this aim, a new specific final
state reconstruction method is presented. Finally, we consider the sensitivity to search
for $Q = 2/3$ singlets. The combination of these three decay modes allows the discovery of a
500 GeV quark with 7 fb$^{-1}$ of luminosity.

1 Introduction 

The discovery of the Higgs boson is one of the main goals of the Large Hadron Collider (LHC). 
Our present understanding of electroweak symmetry breaking in the Standard Model (SM) 
relies on the existence of at least one of such scalar particles [1], whose mass is however not 
predicted. Direct searches at LEP have placed the limit $M_H > 114.4$ GeV on the mass of a
SM-like Higgs, with a 95% confidence level (CL) [2]. Actually, data taken by the ALEPH
collaboration showed an excess of events over the SM background consistent with a 115 GeV 
Higgs boson, but these results were not confirmed by the other LEP collaborations. There is 
some theoretical prejudice leading us to believe in the existence of a Higgs boson not much 
heavier than this direct bound. Precision electroweak data seem to indicate its existence, 
with a best-fit central value of $M_H = 91$ GeV for its mass [3] if the SM is assumed. On the other
hand, the Higgs boson must be lighter than around 1 TeV if the SM is required to remain 
perturbative up to the unification scale [4]. 

There is a vast Higgs search program at LHC, including various production processes and 
the decay channels relevant in each mass range [5,6]. Most analyses focus on the search for
a SM-like Higgs boson. For masses $M_H \lesssim 130$ GeV the decay $H \to b\bar b$ dominates, with a
branching fraction around 0.7. However, the most important production process $gg \to H$
is not visible in this channel due to the enormous QCD background. One then has to fall
back either on rare decay modes, production processes in association with extra particles, or
both. One example is the production together with a $t\bar t$ pair, with $H \to b\bar b$ and semileptonic
decay of $t$, $\bar t$. Further examples are $gg \to H$ followed by $H \to \gamma\gamma$ (which has a branching
ratio around 0.2%), or associated production $t\bar tH$, $WH$, $ZH$ with $H \to \gamma\gamma$. Simulations
performed by the ATLAS collaboration [7, 8] estimated that $t\bar tH$ with $H \to b\bar b$ allows one to reach
$5\sigma$ significance for a 120 GeV Higgs boson with an integrated luminosity of 100 fb$^{-1}$, while
very recent results from CMS [9], with a more realistic background calculation, considerably
lower these expectations: even in the ideal case of no systematic uncertainties, $5\sigma$ significance
would only be possible with $\sim 180$ fb$^{-1}$ (combining several decay channels of the $t\bar t$ pair).
Hence, discovery of $t\bar tH$, with $H \to b\bar b$, seems unfeasible. However, the combination of $H$,
$t\bar tH$, $WH$ and $ZH$ production, with $H \to \gamma\gamma$, is expected to give $5\sigma$ already with 60 fb$^{-1}$,
providing also a relatively precise measurement of the Higgs mass. Vector boson fusion (VBF)
processes $qq \to q'q'H$, with $H \to W^+W^- \to \ell^+\nu\,\ell'^-\bar\nu$, provide a similar sensitivity [10].

For larger Higgs masses the prospects are better. For $130\ \mathrm{GeV} < M_H < 2M_W$, $gg \to H$
production with decay $H \to ZZ^* \to \ell^+\ell^-\ell'^+\ell'^-$ provides a very clean experimental signature
of four charged leptons. A Higgs particle with $M_H = 130$ GeV may be detected in this channel
with 15 fb$^{-1}$, and for $M_H = 150$ GeV the luminosity required is reduced to 3 fb$^{-1}$. VBF
processes are also interesting in this mass range, allowing discovery of the Higgs with 12.9
fb$^{-1}$ for $M_H = 130$ GeV and 3.5 fb$^{-1}$ for $M_H = 150$ GeV [10]. For slightly larger masses,
$2M_W < M_H < 2M_Z$, the $H \to ZZ$ mode is strongly suppressed due to the appearance of
the on-shell decay $H \to W^+W^-$. Two signals are interesting in this range: $gg \to H$, with
$W^+W^- \to \ell^+\nu\,\ell'^-\bar\nu$, giving $5\sigma$ significance for a luminosity around 4 fb$^{-1}$ [11], and again
VBF processes, with leptonic or semileptonic decays of the $W$ pair, which improve this result
giving the same sensitivity for 2 fb$^{-1}$. For masses larger than $2M_Z$, the mode $H \to ZZ$ is
possible with both $Z$ bosons on their mass shell. This channel alone can signal the existence
of a Higgs boson with a luminosity ranging from 2.6 fb$^{-1}$ for $M_H = 200$ GeV to 32 fb$^{-1}$ for
$M_H = 600$ GeV [12]. Larger masses up to approximately 1 TeV can be probed combining
different channels.

In SM extensions these production mechanisms can be enhanced or suppressed, and new 
ones may appear. In this work we analyse in detail a new production mechanism [14], possible 
when the top quark mixes with a new Q = 2/3 singlet. Such particles appear in Little Higgs 
models [15], extra-dimensions [16], and grand unified theories [17]. They can be produced in 
pairs at LHC, through standard QCD interactions, with a large cross section for moderate 
masses of a few hundred GeV. Their decays are determined by their mixing with SM quarks,
which (by theoretical considerations and experimental constraints) is expected to be largest
with the third generation. In particular, their decays to $Ht$ occur with a branching ratio
close to 25% for $M_H \ll m_T$. This possibility would be especially welcome, since it increases
the observability of a Higgs boson in the mass region $M_H < 130$ GeV where its detection is
more difficult. For definiteness, we will assume $M_H = 115$ GeV, though the results are rather
insensitive to the Higgs mass, as long as the main decay channel is $H \to b\bar b$. The largest cross
section corresponds to 

$$gg,\, q\bar q \to T\bar T \to W^+ b\, H\bar t \,/\, Ht\, W^-\bar b \to W^+ b\, W^-\bar b\, H\,, \tag{1}$$

with semileptonic decay of the $W$ pair and $H \to b\bar b$. It gives the same experimental signature
$\ell\nu b\bar b b\bar b jj$ as SM $t\bar tH$ production but the kinematics is rather different. Two further processes
contribute to the Higgs signal, 

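$$gg,\, q\bar q \to T\bar T \to Ht\, H\bar t \to W^+ b\, W^-\bar b\, HH\,,$$

$$gg,\, q\bar q \to T\bar T \to Zt\, H\bar t \,/\, Ht\, Z\bar t \to W^+ b\, W^-\bar b\, HZ\,, \tag{2}$$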

yielding the same final state, or the same state plus two jets, when the extra Higgs and $Z$
boson decay $H \to b\bar b, c\bar c$, $Z \to q\bar q, \nu\bar\nu$. In this work we compare the discovery potential in
this final state within the SM (in which case the only signal is $t\bar tH$) and with a new singlet
$T$, assuming for its mass a reference value $m_T = 500$ GeV. It has been shown that such a
particle could be seen at LHC in a short time, through its decays $T\bar T \to W^+ b\, W^-\bar b$ [18].$^1$
An experimental search in the $\ell\nu b\bar b b\bar b jj$ final state would improve the statistical significance of
the $T$ signal and, what is perhaps even more important, it would allow a prompt discovery of
the Higgs boson.

We remark that, in contrast to what happens with a fourth sequential generation [20], a 
quark singlet contributes very little to $gg \to H$ in general, due to its tiny Yukawa coupling
obtained by mixing with the top quark. The amplitudes for $gg \to H$ mediated by $T$ and top
quarks, relative to the SM one (involving the top quark only), can be written as
with $X_{TT}$, $X_{tt}$ being mixing factors (see next section for details) and $I$ a loop function. The ratio
in brackets is very close to unity for a light Higgs, and takes the value 0.977 for $M_H = 115$
GeV, $m_T = 500$ GeV. With typical values $X_{TT} \simeq 0.04$, $X_{tt} \simeq 0.96$ for the mixing factors,
for $m_T = 500$ GeV the $T$ amplitude is about 9 times smaller than the SM one, and the top
quark contribution is reduced by a factor 0.96. We also note that in particular SM extensions
including $Q = 2/3$ singlets other processes and/or channels may be enhanced or suppressed.
An interesting example takes place in Little Higgs models, where the $gg \to H$ cross section
may be suppressed but the branching ratio for $H \to \gamma\gamma$ can increase in some regions of
parameter space, due to the extra contribution of the new fermions to the effective $H\gamma\gamma$
vertex [13]. We finally note that in models with one (or more) $Q = -1/3$ singlet $B$ there are
large Higgs signals from $B\bar B$ production and decay $B \to Hb$, giving different final states from
the ones studied here [14].

$^1$ $Q = 2/3$ singlets with masses up to 1.1 TeV can be discovered at LHC in this channel, for three years at
the high luminosity run (100 fb$^{-1}$ per year). For $Q = -1/3$ singlets the discovery reach is very similar [19].



2 Summary of the model 

SM extensions with vector-like quarks under $SU(2)_L$ have been introduced before, and their
phenomenology has been extensively explored [21-24]. Here we will briefly recall the main
features of a SM extension with a $Q = 2/3$ quark singlet, summarising the most relevant
points for this work. The addition of two $SU(2)_L$ singlet fields $T^0_{L,R}$ to the quark spectrum
modifies the weak and scalar interactions involving $Q = 2/3$ quarks, but does not affect
strong and electromagnetic interactions. (We denote weak eigenstates with a zero superscript,
to distinguish them from mass eigenstates which do not bear superscripts.) Thus, the new
$Q = 2/3$ mass eigenstate $T$ can be produced in pairs in $pp$ collisions via QCD interactions
like the top quark. The production cross section, plotted in Fig. 1, decreases with $m_T$ but is
sizeable for $T$ masses of several hundred GeV. For our evaluations we will take $m_T = 500$
GeV, well above the present Tevatron limit $m_T > 258$ GeV at 95% CL [25].$^2$

Figure 1: Total production cross section for $gg, q\bar q \to T\bar T$ for different $T$ masses.



$^2$ This limit assumes $\mathrm{Br}(T \to W^+ b) = 1$. The new eigenstate can also decay via $T \to Zt$, $T \to Ht$ (see below),
but these two channels are kinematically forbidden for $m_T = 258$ GeV.

The decay of the new quark takes place through electroweak and scalar interactions. Using
standard notation, these interactions read
where $u = (u, c, t, T)$, $d = (d, s, b)$ and $P_{R,L} = (1 \pm \gamma_5)/2$. The extended Cabibbo-Kobayashi-Maskawa
(CKM) matrix $V$ is of dimension $4 \times 3$, $X = VV^\dagger$ is a non-diagonal $4 \times 4$ matrix
and $M^u$ is the $4 \times 4$ diagonal up-type quark mass matrix. The new mass eigenstate $T$ is
expected to couple mostly with third generation quarks $t$, $b$, because $T^0_L$, $T^0_R$ preferably mix
with $t^0_L$, $t^0_R$, respectively, due to the large top quark mass. $V_{Tb}$ is mainly constrained by
the contribution of the new quark to the T parameter [24]. For $m_T = 500$ GeV, the most
recent value $\mathrm{T} = -0.03 \pm 0.09$ [26] implies $|V_{Tb}| < 0.17$ with a 95% CL. Mixing of $T^0_L$ with
$u^0_L$, $c^0_L$, especially with the latter, is very constrained by parity violation experiments and
the measurement of $R_c$ and $A_{FB}^{0,c}$ at LEP, respectively [22,27], implying small $X_{uT}$, $X_{cT}$. The
charged current couplings with $d$, $s$ must be small as well, $|V_{Td}|, |V_{Ts}| \sim 0.05$, because otherwise
the new quark would give large loop contributions to kaon and $B$ physics observables [24].
Therefore, $|V_{Td}|, |V_{Ts}| \ll |V_{Tb}|$ and $|X_{uT}|, |X_{cT}| \ll |X_{tT}|$. The couplings of the $t$, $T$ quarks
can be expressed in terms of the charged current coupling $V_{Tb}$,

As mentioned above, the $T\bar T$ cross section is independent of $V_{Tb}$ and, as we
will see below, the branching ratios are independent too. The only place where this mixing
appears is the total $T$ width, which is much smaller than the experimental resolution for
the $T$ mass. Thus, $V_{Tb}$ has no influence at all on our results. For definiteness, we have
taken for our evaluations a coupling $V_{Tb} = 0.2$. This value is slightly above the most recent
95% CL limit from the T parameter (and compatible with the previous one, $|V_{Tb}| < 0.26$). For
this coupling, the Yukawa coupling of the top quark is $y_{Htt} = (m_t/2M_W)\,X_{tt}$, reduced by
a factor 0.96 with respect to its SM value, and the Yukawa of the new quark is very small,
$y_{HTT} = (m_T/2M_W)\,X_{TT}$ with $X_{TT} = 0.04$. The relevant decays of the new quark are
$T \to W^+ b,\ Zt,\ Ht$, with partial widths
with

$$\lambda(m_T, m_t, M) \equiv (m_T^4 + m_t^4 + M^4 - 2 m_T^2 m_t^2 - 2 m_T^2 M^2 - 2 m_t^2 M^2)$$

a kinematical function. The two couplings $V_{Tb}$, $X_{tT}$ involved in the decays are approximately
equal (see the coupling relations above). Since the three partial widths are proportional to $|V_{Tb}|^2$, the branching
ratios only depend on $m_T$ and $M_H$. They are plotted in Fig. 2 for a fixed value $M_H = 115$
GeV. For $m_T = 500$ GeV, we have $\mathrm{Br}(T \to W^+ b) = 0.503$, $\mathrm{Br}(T \to Zt) = 0.166$, $\mathrm{Br}(T \to
Ht) = 0.331$. (The total $T$ width is $\Gamma_T = 3.115$ GeV for $V_{Tb} = 0.2$.) Decays $T \to Zt \to \ell^+\ell^- W^+ b$,
$\ell = e, \mu$ give a cleaner final state than $T \to W^+ b$, but with a branching ratio 10 times smaller.
The channel $T \to W^+ b$ ($\bar T \to W^-\bar b$) gives the best discovery potential for the new quark in
single $T$ [28] as well as in $T\bar T$ production [18]. The remaining decay $T \to Ht$ constitutes a
copious source of Higgs bosons for moderate $T$ masses, for which the $T\bar T$ production cross
section is large. We point out that in the minimal SM extension where only one $Q = 2/3$
singlet is introduced these branching ratios are independent of the mixing, and the new quark
(provided it is not decoupled) always decays $T \to Ht$ if $m_T > m_t + M_H$. In models with
extra interactions, decays to $W'$, $Z'$ bosons may occur, if kinematically allowed. If additional
scalars exist, mixing with the lightest one $H$ might also be suppressed, if this Higgs is not
SM-like.
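The single-quark branching ratios quoted above fix how often a $T\bar T$ pair ends up in each combination of decay modes, which is what determines the relative size of the Higgs signals. The following is a minimal sketch of that arithmetic, assuming the two quarks decay independently; the function name and the dictionary layout are illustrative, and the subsequent $W$, $Z$ and $H$ decays are not included.

```python
from itertools import product

# Single-T branching ratios quoted in the text for m_T = 500 GeV, M_H = 115 GeV.
BR = {"W": 0.503, "Z": 0.166, "H": 0.331}


def pair_fractions(br):
    """Fraction of T Tbar pairs in each unordered pair of decay modes.

    Assumes the two heavy quarks decay independently, so the ordered
    probability is br[a] * br[b]; mixed modes such as WH collect both orderings.
    """
    fractions = {}
    for a, b in product(br, repeat=2):
        # Label modes in the order W, Z, H (e.g. "WH", "ZH", "WZ").
        key = "".join(sorted((a, b), key="WZH".index))
        fractions[key] = fractions.get(key, 0.0) + br[a] * br[b]
    return fractions


if __name__ == "__main__":
    for mode, frac in sorted(pair_fractions(BR).items(), key=lambda kv: -kv[1]):
        print(f"TT({mode}): {frac:.3f}")
```

With these inputs roughly one third of the pairs fall in the $WH$ mode, while the $HH$ and $ZH$ modes each account for about 11%.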

Figure 2: Branching ratios for $T \to W^+ b$, $T \to Zt$, $T \to Ht$, for different $T$ masses.



3 Signal and background simulation 

Many SM and some new physics processes give or mimic the experimental signature studied,
consisting of a charged lepton, at least four $b$-tagged jets and two non-tagged jets, plus missing energy.
The relevant processes are calculated with matrix-element-based Monte Carlo generators and
fed into PYTHIA [29] to include initial and final state radiation (ISR, FSR) and pile-up, and
perform hadronisation. The main background is constituted by $t\bar t + n$ jet production. It is
calculated, with $n = 0, \dots, 5$, with ALPGEN [30], using the MLM prescription [31] to avoid
double counting of jet radiation performed by PYTHIA. ALPGEN is also used to calculate the
production of $W$ and $Z$ bosons plus six jets, or a $b\bar b$ / $c\bar c$ pair and four jets. New Monte Carlo
generators are developed for $T\bar T$, $t\bar tH$, $t\bar t b\bar b$ and $t\bar t c\bar c$ (through QCD and electroweak (EW)
interactions) and $Wb\bar b b\bar b$ production, plus other processes obtained replacing the top quarks
by heavy $T$ quarks. These generators use the full resonant tree-level matrix elements for the
production and decay processes, namely

Matrix elements are calculated with HELAS [35], partly using MadGraph [36]. All finite width
and spin effects are thus automatically taken into account. The colour flow information
necessary for PYTHIA is obtained following the same method as in AcerMC [37], i.e. we randomly
select the colour flow among the possible ones on an event-by-event basis, computing the
probabilities of such a configuration from the matrix element (taking into account the diagrams
contributing to such configuration). Integration in phase space is done with VEGAS [38],
modified following Ref. [37]. These generators (except $Wb\bar b b\bar b$) have been checked against
ALPGEN using the same parameters, structure functions and factorisation scales, obtaining
very good agreement. For our evaluations we take $m_t = 175$ GeV, $m_b = 4.8$ GeV, $m_c = 1.5$
GeV (neglected in $W$ decays), $\alpha(M_Z) = 1/128.878$, $s_W^2(M_Z) = 0.23113$, $\alpha_s(M_Z) = 0.127$
and run the coupling constants up to the scale of the heavy ($t$ or $T$) quark. Structure
functions CTEQ5L [39] are used, with $Q^2 = \hat s$ the square of the partonic centre of mass
energy. (For ALPGEN processes we select $Q^2 = M_{W,Z}^2 + p_{t\,W,Z}^2$.) Several representative total
cross sections obtained (without decay branching ratios or phase space cuts) can be found
in Table 1, for comparison with other generators. The total cross sections for $t\bar t c\bar c$, $t\bar t nj$ with
$n \geq 1$ and $W/Z$ + jets are numerically unstable due to collinear singularities and are not shown.
This is not a problem for event generation, since suitable kinematical cuts at the generator
level (discussed below) can be applied to stabilise the cross sections.



Process                  $\sigma_{\rm tot}$      Process                  $\sigma_{\rm tot}$
$t\bar t$ (ALPGEN)       489 pb                  $t\bar t b\bar b$        8.65 pb
$T\bar T$                2.14 pb                 $t\bar t b\bar b$ EW     773 fb
$t\bar tH$               508 fb                  $Wb\bar b b\bar b$       303.4 fb

Table 1: Total cross sections for several processes studied.

In our analysis we consider semileptonic decays of the $W^+W^-$ pairs, and leptonic decays
in the production of $W/Z$ + jets. The main contributions come from $\ell = e, \mu$, but decays to
$\tau$ leptons are included as well. Phase space cuts are applied at the generator level in some
processes to reduce statistical fluctuations and improve the unweighting efficiency. The cuts
applied are

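$$
\begin{array}{ll}
t\bar t nj: & |\eta^{j}| < 2.5\,,\quad p_t^{j} > 20\ \mathrm{GeV}\,,\quad \Delta R^{jj} > 0.4 \\
t\bar t b\bar b,\ t\bar t c\bar c,\ Wb\bar b b\bar b: & |\eta^{b,c}| < 2.5\,,\quad p_t^{b,c} > 15\ \mathrm{GeV} \\
Wb\bar b jjjj,\ Wc\bar c jjjj,\ Wjjjjjj: & |\eta^{\ell,b,j}| < 2.5\,,\quad p_t^{\ell} > 6\ \mathrm{GeV}\,,\quad p_t^{b,j} > 15\ \mathrm{GeV}\,, \\
 & \Delta R^{jj,bb} > 0.4\,,\quad \Delta R^{\ell j,\ell b} > 0.4 \\
Zb\bar b jjjj,\ Zc\bar c jjjj,\ Zjjjjjj: & |\eta^{b,j}| < 2.5\,,\quad p_t^{\ell,\rm max} > 6\ \mathrm{GeV}\,,\quad p_t^{b,j} > 15\ \mathrm{GeV}\,, \\
 & \Delta R^{jj,bb} > 0.4\,,
\end{array} \tag{9}
$$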

where $\eta$ is the pseudorapidity, $p_t$ the transverse momentum and $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ the
lego-plot distance. The cross sections after decay, including generator cuts, can be read in
Table 2 for $\ell = e, \mu$. The $T\bar T$ processes in Eqs. (8) will from now on be denoted according
to the decay mode as $T\bar T(WH)$, $T\bar T(HH)$, $T\bar T(ZH)$, $T\bar T(WZ)$ and $T\bar T(ZZ)$. Sum over
charge conjugate decays is always understood.
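The lego-plot distance used in these cuts is straightforward to compute, the only subtlety being that the azimuthal difference has to be wrapped into $(-\pi, \pi]$. A minimal sketch follows; the function names are illustrative and not taken from any of the generators mentioned in the text.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Lego-plot distance sqrt((d eta)^2 + (d phi)^2) between two objects."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def well_separated(eta1, phi1, eta2, phi2, min_dr=0.4):
    """True if the two objects pass a Delta R > min_dr separation cut."""
    return delta_r(eta1, phi1, eta2, phi2) > min_dr
```

Two jets at $\phi = 0.1$ and $\phi = 2\pi - 0.1$ with equal pseudorapidity are separated by $\Delta R = 0.2$ rather than by almost $2\pi$, so they would fail the $\Delta R > 0.4$ requirement.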

Process               $\sigma$     $\varepsilon$      Process               $\sigma$     $\varepsilon$
$T\bar T(WH)$         173.6 fb     6.3%               $t\bar t b\bar b$     564.9 fb     4.7%
$T\bar T(HH)$         44.38 fb     19.3%              $t\bar t c\bar c$     630.5 fb     0.65%
$T\bar T(ZH)$         50.0 fb      8.5%               $t\bar t b\bar b$ EW  60.31 fb     4.8%
$T\bar T(WZ)$         29.03 fb     4.5%               $t\bar t c\bar c$ EW  17.12 fb     0.72%
$T\bar T(ZZ)$         14.07 fb     2.8%               $Wjjjjjj$             69.85 pb     $\sim 7.4 \times 10^{-6}$
$T\bar T b\bar b$     1.054 fb     5.3%               $Wb\bar b jjjj$       2.825 pb     0.12%
$t\bar tH$            118.7 fb     4.7%               $Wc\bar c jjjj$       3.279 pb     $\sim 0.015$%
$t\bar t$             143.2 pb     0.034%             $Wb\bar b b\bar b$    2.587 fb     $\sim 3.4$%
$t\bar t j$           142.7 pb     0.055%             $Zjjjjjj$             10.48 pb     $\sim 3.9 \times 10^{-6}$
$t\bar t 2j$          95.9 pb      0.085%             $Zb\bar b jjjj$       722.5 fb     0.090%
$t\bar t 3j$          54.0 pb      0.12%              $Zc\bar c jjjj$       738.5 fb     $\sim 0.013$%
$t\bar t 4j$          27.4 pb      0.15%
$t\bar t 5j$          12.8 pb      0.19%

Table 2: Cross section at the generator level and efficiency $\varepsilon$ for signal and background
processes in the decay channels with $\ell = e, \mu$. The corresponding cross sections for final
states with tau leptons are approximately one half, with efficiencies 20-30 times smaller.



The generated events are passed through PYTHIA 6.403 as external processes to include
ISR, FSR, pile-up and perform hadronisation.$^3$ We use the standard PYTHIA settings except
for $b$ fragmentation, in which we use the Peterson parameterisation with $\epsilon_b = -0.0035$ [40].
For pile-up we take 4.6 events on average, corresponding to a luminosity of $2 \times 10^{33}$ cm$^{-2}$ s$^{-1}$.
Tau leptons in the final state are decayed using TAUOLA [32] and PHOTOS [33]. A fast detector
simulation ATLFAST 2.60 [34], with standard settings, is used for the modelling of the ATLAS
detector. We reconstruct jets using a cone algorithm with $\Delta R = 0.4$. This cone size has
proved to be the most adequate for top physics studies [41], providing very good agreement
between fast and full simulations for reconstructed quantities [42]. We do not apply trigger
inefficiencies and assume a perfect charged lepton identification. The package ATLFASTB is
used to recalibrate jet energies and perform $b$ tagging, for which we select a 60% efficiency at
the low luminosity run, with nominal rejection factors of 93 for light jets and 6.7 for charm,
and $p_t$-dependent corrections. These efficiencies are in agreement with those obtained from
full simulations [43], and comparable to the ones expected at CMS [44].

$^3$ In order to avoid double counting, in the PYTHIA simulation of the $W/Z$ + 6 jets processes we turn off $b\bar b$
and $c\bar c$ pair radiation, which are independently generated. Similarly, for $t\bar t c\bar c$ and $W/Z + b\bar b / c\bar c$ + 4 jets we turn
off $b\bar b$ pair radiation. The radiation of extra jets in $t\bar t nj$ processes is vetoed following the MLM prescription.

The hadronised events are required to fulfill these two criteria: (a) the presence of one
(and only one) isolated charged lepton, which must have transverse momentum $p_t > 25$ GeV
(for electrons), $p_t > 20$ GeV (for muons) and $|\eta| < 2.5$; (b) at least six jets with $p_t > 20$
GeV, $|\eta| < 2.5$, with at least four $b$ tags and two untagged jets. The charged leptons provide
a trigger for the events [45]. Signal and background efficiencies after these requirements are
shown in Table 2. We notice the higher acceptance for the $T\bar T(HH)$ process, with six $b$ quarks
in the final state when both Higgs bosons decay to $b\bar b$, and for $T\bar T(ZH)$, where sometimes
two $b$ quarks are produced in the $Z$ decay. We also point out the growing efficiency of the
$t\bar t nj$ processes with increasing multiplicity.
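The two pre-selection criteria can be written as a short event filter. The sketch below is only illustrative: the `Lepton` and `Jet` containers are hypothetical and do not correspond to ATLFAST data structures, and it assumes that "one and only one" refers to isolated leptons passing the $p_t$ and $|\eta|$ cuts, which is one possible reading of criterion (a).

```python
from dataclasses import dataclass


@dataclass
class Lepton:
    flavour: str          # "e" or "mu" (hypothetical labels)
    pt: float             # transverse momentum in GeV
    eta: float
    isolated: bool = True


@dataclass
class Jet:
    pt: float             # transverse momentum in GeV
    eta: float
    b_tagged: bool = False


# Lepton pt thresholds quoted in the text, in GeV.
LEPTON_PT_MIN = {"e": 25.0, "mu": 20.0}


def passes_preselection(leptons, jets):
    """Apply criteria (a) and (b): one lepton, >= 6 jets, >= 4 b tags, >= 2 untagged."""
    good_leptons = [
        lep for lep in leptons
        if lep.isolated
        and lep.pt > LEPTON_PT_MIN[lep.flavour]
        and abs(lep.eta) < 2.5
    ]
    if len(good_leptons) != 1:
        return False
    good_jets = [j for j in jets if j.pt > 20.0 and abs(j.eta) < 2.5]
    n_b = sum(1 for j in good_jets if j.b_tagged)
    n_untagged = len(good_jets) - n_b
    return len(good_jets) >= 6 and n_b >= 4 and n_untagged >= 2
```

An event with one 30 GeV muon, four tagged jets and two untagged jets inside the acceptance passes, while the same event with only three tags is rejected.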

Finally, we must note that our calculation of the $Wb\bar b b\bar b jj$ background, with $Wb\bar b b\bar b$
production at the generator level and extra jet radiation performed by PYTHIA, must be regarded
as an estimate. The reason is that in $Wb\bar b b\bar b$ only $q\bar q'$ scattering processes are involved, while
gluon fusion contributes to $Wb\bar b b\bar b jj$. At any rate, this background turns out to be completely
negligible. $Zb\bar b b\bar b$ production has an even smaller cross section and we have not included it
in our calculations. We have investigated $t\bar t b\bar b b\bar b$ production with ALPGEN, which might be
important if five or more $b$ tags are required. The cross section (with the same cuts used
before) is 0.54 fb. Assuming a similar detection efficiency as for $t\bar t b\bar b$, the requirement of
five tagged jets reduces the cross section to one event for 30 fb$^{-1}$ (and 0.3 events with 6 $b$
tags). One may also think about $T\bar TH$ production, with $T\bar T \to W^+ b\, W^-\bar b$, also contributing
to the final state studied. This process is irrelevant due to the small Yukawa coupling of the
$T$ quark.

4 Higgs boson discovery 

We simulate events for an integrated luminosity of 30 fb$^{-1}$, which can be collected in three
years at the low luminosity phase. For some background processes the number of events
simulated corresponds to the cross section obtained from the generator scaled by a $k$ factor,
to take into account higher order contributions with extra jets. (This $k$ factor accounting for
higher multiplicity processes must not be confused with a $K$ factor to take radiative corrections
into account.) For the main background, $t\bar t nj$ production, higher order processes are explicitly
calculated, and $k$ factors are not included except for $n = 5$, where we set $k = 1.46$ to account
for $t\bar t$ + 6 jets. For $t\bar t b\bar b$ and $t\bar t c\bar c$, the $k$ factor is estimated from the $t\bar t nj$ cross sections as
$k = [\sigma(t\bar t 2j) + \cdots + \sigma(t\bar t 6j)]/\sigma(t\bar t 2j) = 2.05$. For $W/Z$ plus jets we use the approximate
prescription in Ref. [18], which gives $k = 2$-$3$. For all signals we conservatively set $k = 1$.
The reason for this will be explained later. The number of events generated for each process
can be read in Table 3. In the sums, the first term corresponds to final states with $\ell = e, \mu$ and
the second one to $\ell = \tau$, but in the following all lepton channels will be summed. A subtlety
in the analysis is that when the singlet $T$ is introduced the $Htt$, $Wtb$ and $Ztt$ couplings of
the top quark are modified. This affects electroweak $t\bar t b\bar b$ and $t\bar t c\bar c$ production in a non-trivial
way, and different samples (taking into account the corrections to the couplings) must be
generated and simulated. $t\bar tH$ production is modified as well, with the Yukawa coupling of
the top quark reduced by a factor $X_{tt} < 1$. In our case, we have assumed a large mixing
$V_{Tb} = 0.2$, for which $|V_{tb}| = 0.98$ and $X_{tt} = 0.96$. These processes are indicated with a "(T)"
in Table 3, where we can observe that the effect of mixing is negligible for $t\bar t b\bar b$ and $t\bar t c\bar c$. A
second issue to keep in mind is that when studying the new physics signals associated to the
$T$ quark we must distinguish the cases where the Higgs boson is present or not (if not, the
branching ratios for $T \to W^+ b$ and $T \to Zt$ are larger). The latter are denoted with a "M".

Table 3: For each process: number of events simulated $N_0$ and number of events passing the
pre-selection criteria $N$. The first terms in the sums correspond to $\ell = e, \mu$, and the second
ones to $\ell = \tau$. For some contributions (marked with an asterisk) we have simulated at least
$10 N_0$ events and rescaled the result to 30 fb$^{-1}$, so as to reduce statistical fluctuations.
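The $k$ factor for $t\bar t b\bar b$ and $t\bar t c\bar c$ can be cross-checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: it takes the generator-level $t\bar t nj$ cross sections listed in Table 2 (reading the $t\bar t 4j$ entry as 27.4 pb), and it infers $\sigma(t\bar t 6j)$ from the $k = 1.46$ factor applied to $t\bar t 5j$, i.e. $\sigma(t\bar t 6j) = 0.46\,\sigma(t\bar t 5j)$. The function name is hypothetical.

```python
# Generator-level ttbar + n jets cross sections in pb, as read from Table 2.
# The n = 4 entry is taken to be 27.4 pb (an assumption about the table value).
SIGMA_TTNJ_PB = {2: 95.9, 3: 54.0, 4: 27.4, 5: 12.8}


def k_factor_from_multiplicities(sigma_pb, k_last=1.46, n_ref=2):
    """Ratio [sigma(n_ref) + ... + sigma(n_max + 1)] / sigma(n_ref).

    The highest explicitly generated multiplicity n_max is scaled by k_last,
    so the unlisted (n_max + 1)-jet cross section is (k_last - 1) * sigma(n_max).
    """
    n_max = max(sigma_pb)
    extra = (k_last - 1.0) * sigma_pb[n_max]
    total = sum(s for n, s in sigma_pb.items() if n >= n_ref) + extra
    return total / sigma_pb[n_ref]


if __name__ == "__main__":
    k = k_factor_from_multiplicities(SIGMA_TTNJ_PB)
    print(f"k = {k:.2f}")  # about 2.04 with these rounded inputs
```

With the rounded table entries this gives $k \approx 2.04$, close to the value 2.05 quoted in the text.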

The discovery potential for the Higgs boson crucially depends on systematic errors. The
uncertainty in the background normalisation makes it difficult to detect the presence of a
Higgs boson with a measurement of the total $\ell\nu b\bar b b\bar b jj$ cross section. Naively, from the data in
Table 3 one could conclude that the statistical significance of the $t\bar tH$ signal, before applying
any kinematical cut, is $S/\sqrt{B} = 170.3/\sqrt{13158.9} = 1.48\sigma$. However, this estimate does
not include the systematic uncertainty in the SM background total cross section (i.e. the
background normalisation). A detailed calculation of systematic uncertainties is beyond the
scope of this work. They generally arise from two sources: (i) the theoretical uncertainty
in cross sections, due to higher loop contributions and uncertainty in parton distribution
functions, among others; (ii) the systematic uncertainty related to the experimental detection
($b$ tagging, jet energy scaling, etc.). The former can go up to 30-50% for $t\bar t nj$ with large
$n$, but they are reducible with more accurate theoretical calculations and/or background
measurements (understanding to what extent they are reduced probably requires real data).
For the latter we assume a "reference" value of 20%, close to the value $\sim 26$% obtained in
Ref. [9] with a detailed analysis for the CMS detector. We replace the estimator $S_0 = S/\sqrt{B}$
by

$$S_{20} = \frac{S}{\sqrt{B + (0.2B)^2}}\,, \tag{10}$$

where $S$ is the excess of events over the expected background. Incorporating systematic
uncertainties in the previous example, we obtain a much smaller (but more realistic) significance
$S_{20} = 0.064\sigma$. We note that adding statistical and systematic uncertainties in quadrature is
not the only way to incorporate systematics into the significance. Other possibilities exist,
which are perhaps more correct from the statistical point of view, but we use this one for
simplicity and in order to compare better with other studies.
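The effect of the background normalisation uncertainty on the significance can be made explicit with a small numerical sketch. It assumes the quadrature form of the estimator with a fractional background uncertainty `syst` (0.2 for the 20% reference value); the function name is illustrative.

```python
import math


def significance(signal, background, syst=0.0):
    """S / sqrt(B + (syst * B)^2): statistical and systematic errors in quadrature.

    syst is the fractional uncertainty on the background normalisation;
    syst = 0 reduces to the purely statistical estimator S / sqrt(B).
    """
    return signal / math.sqrt(background + (syst * background) ** 2)


if __name__ == "__main__":
    S, B = 170.3, 13158.9  # ttH signal and SM background before kinematical cuts
    print(f"S0  = {significance(S, B):.2f} sigma")        # about 1.48
    print(f"S20 = {significance(S, B, 0.2):.4f} sigma")   # about 0.0646
```

For a background this large the systematic term $(0.2B)^2$ dominates completely over the statistical term $B$, so collecting more luminosity alone does not help: the significance only improves if cuts reduce $B$ to a level where $\sqrt{B}$ and $0.2B$ are comparable, i.e. a few tens of events.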

In the following we perform two different analyses of signals and backgrounds. The first
one is a "standard" analysis aiming to discover the Higgs boson in $t\bar tH$ production, in which
we reconstruct the final state to distinguish this signal from the SM background. If
a new quark $T$ exists, additional signal events will improve the Higgs discovery potential.
The second analysis specifically looks for a Higgs boson produced in heavy quark decays,
optimising the reconstruction for this signal.

4.1 Analysis I: $t\bar tH$ reconstruction

The reconstruction of the $t\bar tH$ signal is not done sequentially, but rather all possible pairings
for light and $b$ jets are tried, selecting the one which best resembles the kinematics of this
process. We reconstruct the $W$ boson decaying hadronically (called "hadronic" $W$ boson)
from a pair of untagged jets $j_1$ and $j_2$. For the leptonic $W$, the missing transverse momentum
is assigned to the neutrino, and its longitudinal momentum and energy are found requiring
that the invariant mass of the charged lepton and neutrino is the $W$ mass, $(p_\ell + p_\nu)^2 = M_W^2$.
This equation gives two real solutions in most cases. In case there is no real solution (the
discriminant of the quadratic equation is negative) we set the discriminant to zero to obtain a solution.
This procedure gives reconstructed mass distributions almost indistinguishable from the ones
obtained using the collinear approximation, i.e. setting $p_\nu^z = p_\ell^z$.

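The neutrino reconstruction described above amounts to solving a quadratic equation for the longitudinal momentum. The sketch below shows one way to do it, assuming a massless charged lepton; the function name, the four-vector layout `(E, px, py, pz)` and the default value $M_W = 80.4$ GeV are assumptions made for illustration and are not specified in the text.

```python
import math


def neutrino_pz(lepton, met_x, met_y, m_w=80.4):
    """Solve (p_l + p_nu)^2 = M_W^2 for the neutrino longitudinal momentum.

    lepton is (E, px, py, pz) in GeV, treated as massless; (met_x, met_y) is the
    missing transverse momentum assigned to the neutrino. Returns the two
    solutions; if the discriminant is negative it is set to zero, so both
    returned values coincide.
    """
    e_l, px_l, py_l, pz_l = lepton
    pt_l2 = px_l ** 2 + py_l ** 2
    pt_nu2 = met_x ** 2 + met_y ** 2
    a = 0.5 * m_w ** 2 + px_l * met_x + py_l * met_y
    # Discriminant of the quadratic, up to the positive factor E_l^2.
    disc = a * a - pt_l2 * pt_nu2
    root = math.sqrt(disc) if disc > 0.0 else 0.0
    return [(a * pz_l + sign * e_l * root) / pt_l2 for sign in (+1.0, -1.0)]


def neutrino_four_vector(met_x, met_y, pz):
    """Massless neutrino four-vector (E, px, py, pz) for a chosen pz solution."""
    return (math.sqrt(met_x ** 2 + met_y ** 2 + pz ** 2), met_x, met_y, pz)
```

Both solutions are kept as separate leptonic $W$ candidates, so the choice between them can be left to the subsequent selection of the best jet pairing.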
For each choice of $j_1$, $j_2$ and leptonic $W$ momentum, there are 12 possible assignments of
the four $b$ jets to the two $W$ bosons, to form the two top quarks. (Around 9% of the signal
events have five or more $b$ jets, in which case we select the four with the highest transverse
momentum.) Among all possibilities, we select the one minimising the quantity

$$\frac{(m_t^{\rm had} - m_t)^2}{\sigma_t^2} + \frac{(m_t^{\rm lep} - m_t)^2}{\sigma_t^2} + \frac{(M_W^{\rm had} - M_W)^2}{\sigma_W^2}\,,$$

where $m_t^{\rm had}$, $m_t^{\rm lep}$ and $M_W^{\rm had}$ are the reconstructed masses of the hadronic and leptonic top
quarks, and the hadronic $W$, respectively. $\sigma_t$ and $\sigma_W$ are fixed parameters corresponding
to the widths of the reconstructed distributions, which are taken in this case to be equal,
$\sigma_t = \sigma_W = 10$ GeV. For the best combination, the two remaining (unpaired) $b$ jets are
assumed to originate from the Higgs boson decay, whose momentum and invariant mass
can then be reconstructed. Kinematical cuts are not applied at this level. The results are
shown in Fig. 3. The reconstruction works very well for the $t\bar tH$ signal, with sharp peaks
for the reconstructed masses $M_W^{\rm had}$, $m_t^{\rm had}$ and $m_t^{\rm lep}$, and the Higgs mass distribution mainly
concentrated around the true value $M_H = 115$ GeV. Since the SM background is dominated
by $t\bar t nj$ production with two top quarks, the invariant masses of the hadronic $W$ and the top
pair are very well reconstructed too. For the $T\bar T$ Higgs signal this reconstruction method is
not adequate, and the reconstructed Higgs mass spreads over a wider range.
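The combinatorial choice described above can be sketched as a loop over light-jet pairs, leptonic $W$ candidates and the 12 ordered assignments of two $b$ jets to the two top quarks. This is a minimal illustration, not the code used for the analysis: four-vectors are plain `(E, px, py, pz)` tuples, the function names are hypothetical, and $M_W = 80.4$ GeV is an assumed value (the text fixes only $m_t = 175$ GeV and $\sigma_t = \sigma_W = 10$ GeV).

```python
import math
from itertools import combinations, permutations


def add(*vectors):
    """Component-wise sum of four-vectors given as (E, px, py, pz)."""
    return tuple(sum(components) for components in zip(*vectors))


def mass(vector):
    """Invariant mass of a four-vector, clipped at zero for rounding safety."""
    e, px, py, pz = vector
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def best_pairing(light_jets, b_jets, lep_w_candidates,
                 m_t=175.0, m_w=80.4, sigma_t=10.0, sigma_w=10.0):
    """Return (chi2, (j1, j2), b_had, b_lep, m_higgs) for the best combination.

    b_jets must be ordered by decreasing pt and contain at least four jets;
    only the leading four are used. Indices refer to the input lists.
    """
    b_jets = b_jets[:4]
    best = None
    for j1, j2 in combinations(range(len(light_jets)), 2):
        w_had = add(light_jets[j1], light_jets[j2])
        for w_lep in lep_w_candidates:
            for b_had, b_lep in permutations(range(4), 2):  # 12 assignments
                chi2 = (
                    ((mass(add(w_had, b_jets[b_had])) - m_t) / sigma_t) ** 2
                    + ((mass(add(w_lep, b_jets[b_lep])) - m_t) / sigma_t) ** 2
                    + ((mass(w_had) - m_w) / sigma_w) ** 2
                )
                if best is None or chi2 < best[0]:
                    # The two b jets left over are assigned to the Higgs boson.
                    rest = [b_jets[k] for k in range(4) if k not in (b_had, b_lep)]
                    best = (chi2, (j1, j2), b_had, b_lep, mass(add(*rest)))
    return best
```

The leptonic $W$ candidates would typically be the charged lepton four-vector added to each of the neutrino solutions, so that the discrete ambiguity in the neutrino momentum is resolved by the same minimisation.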

The signal significance can be improved by simply performing a kinematical cut on the
Higgs reconstructed mass. Additionally, we perform a probabilistic analysis (see appendix A),
involving the following variables:

• The light jet multiplicity $N_{\rm jet}$.

• The smallest invariant mass of a $b\bar b$ pair $m_{bb}^{\rm min}$ [7], among those involving the four jets
with largest transverse momentum.

• The sum of the transverse momenta of the two top quarks, $p_t^{\rm had} + p_t^{\rm lep}$.

• Angular quantities characterising the topology of the event: the azimuthal angle and
rapidity difference (i) between the two $b$ jets assigned to the Higgs, $\Delta\phi_{bb}$ and $\Delta\eta_{bb}$; (ii)
between the Higgs and the closest (in $\Delta R$) top quark, $\Delta\phi_{Ht}$ and $\Delta\eta_{Ht}$; (iii) between
the two top quarks, $\Delta\phi_{tt}$ and $\Delta\eta_{tt}$.
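Variables of this kind are commonly combined into a likelihood-ratio discriminant built from their normalised distributions in signal and background samples. The sketch below illustrates one simple construction, assuming that each likelihood is a product of one-dimensional binned distributions (so correlations between variables are neglected); the details of the method actually used are given in the appendix and may differ. All names, and the small floor used to avoid empty bins, are illustrative choices.

```python
import math
from bisect import bisect_right


def binned_pdf(samples, edges, floor=1e-6):
    """Normalised histogram of samples over the given bin edges.

    Empty bins are assigned a small floor probability so that the
    likelihood ratio stays finite.
    """
    counts = [0] * (len(edges) - 1)
    for x in samples:
        k = bisect_right(edges, x) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    total = sum(counts) or 1
    return [max(c / total, floor) for c in counts]


def log10_likelihood_ratio(event, edges, pdfs_sig, pdfs_bkg):
    """log10(L_S / L_B) for one event.

    event maps variable names to values; edges, pdfs_sig and pdfs_bkg map the
    same names to bin edges and binned pdfs. Values outside the histogram range
    are assigned to the nearest bin.
    """
    log_ratio = 0.0
    for name, x in event.items():
        n_bins = len(pdfs_sig[name])
        k = min(max(bisect_right(edges[name], x) - 1, 0), n_bins - 1)
        log_ratio += math.log10(pdfs_sig[name][k] / pdfs_bkg[name][k])
    return log_ratio
```

Events with a large positive value of the log-likelihood ratio are signal-like, and a cut on this single number replaces a set of rectangular cuts on the individual variables.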

Figure 3: Analysis I: Reconstructed masses of the hadronic $W$, the hadronic and leptonic top
quarks and the Higgs boson.

Figure 4: Analysis I: Normalised light jet multiplicity $N_{\rm jet}$ and variable $p_t^{\rm had} + p_t^{\rm lep}$ (see the
text), used in the probabilistic analysis. The jet multiplicities of the two main $T\bar T$ Higgs
signals are displayed separately for later convenience. The $p_t^{\rm had} + p_t^{\rm lep}$ distribution for the
$T\bar T$ Higgs signals is shown for illustration, but not included in the probabilistic analysis as a
separate event class.

Figure 5: Analysis I: Normalised variables $\cos\Delta\phi_{bb}$, $\Delta\eta_{bb}$, $\cos\Delta\phi_{Ht}$, $\Delta\eta_{Ht}$, $\cos\Delta\phi_{tt}$, $\Delta\eta_{tt}$
and $m_{bb}^{\rm min}$ (defined in the text), used in the probabilistic analysis. Log-likelihood function.
The distributions for the $T\bar T$ Higgs signals are shown for illustration, but not included in the
probabilistic analysis as a separate event class.

These variables, plotted in Figs. 4, 5 for the background and reference signal samples (with
more statistics), are not suitable for kinematical cuts but help distinguish $t\bar tH$ production
from the SM background. Additional variables can be considered, but we have found no
improvement including them, and in some cases they reduce the discriminating power of the
likelihood functions (for a discussion see the appendix). Using their distributions for $t\bar tH$
and the SM background we build signal and background likelihood functions $L_S$, $L_B$. The
log-likelihood function $\log_{10} L_S/L_B$ is also plotted in Fig. 5. In the absence of systematic
errors, the highest statistical significance $\mathcal{S}_0 = S/\sqrt{B}$ would be achieved with relatively loose
cuts on $L_S/L_B$. But when one considers systematic uncertainties, the highest significance
$\mathcal{S}_{20}$ is found for stricter cuts, which reduce the background to a few tens of events. For this
purpose, we have found it very useful to employ a hybrid event selection method, in which
we perform a simple cut on the Higgs reconstructed mass and include the rest of the relevant
variables in the likelihood function. The kinematical cuts applied (not fine-tuned but close to
the optimal values) are

$$ \log_{10} L_S/L_B > 0.75 , \qquad 100~{\rm GeV} < M_H^{\rm rec} < 140~{\rm GeV} . \tag{12} $$
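As a rough illustration of the hybrid selection, the Python sketch below evaluates a log-likelihood ratio from binned, normalised distributions and then applies the two cuts of Eq. (12). It assumes that the signal and background likelihoods are products of one-dimensional probabilities (the precise construction is given in the appendix and may differ); the class `BinnedPdf`, the function names and the probability floor are hypothetical.

```python
import math
from bisect import bisect_right


class BinnedPdf:
    """Normalised one-dimensional histogram used as a probability estimate."""

    def __init__(self, edges, contents):
        total = float(sum(contents))
        self.edges = list(edges)
        self.probs = [c / total for c in contents]

    def prob(self, x):
        i = bisect_right(self.edges, x) - 1
        i = min(max(i, 0), len(self.probs) - 1)  # clamp under/overflow into edge bins
        return self.probs[i]


def log10_likelihood_ratio(event, signal_pdfs, background_pdfs, floor=1e-6):
    """log10(L_S / L_B), assuming uncorrelated variables (product of 1D pdfs)."""
    total = 0.0
    for name, value in event.items():
        p_s = max(signal_pdfs[name].prob(value), floor)      # floor avoids log(0)
        p_b = max(background_pdfs[name].prob(value), floor)
        total += math.log10(p_s) - math.log10(p_b)
    return total


def passes_hybrid_selection(event, m_higgs_rec, signal_pdfs, background_pdfs):
    """Likelihood cut plus Higgs mass window, with the values of Eq. (12)."""
    llr = log10_likelihood_ratio(event, signal_pdfs, background_pdfs)
    return llr > 0.75 and 100.0 < m_higgs_rec < 140.0
```

Keeping the mass window as an explicit cut, rather than as one more likelihood variable, mirrors the hybrid strategy described in the text.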

The number of events corresponding to each process can be read in Table 4. We point out
that the inclusion of the light jet multiplicity as a likelihood variable significantly reduces
the $t\bar tnj$ background for larger $n$. $W/Z$ plus jets is essentially eliminated for high $L_S$ values,
even without explicitly requiring a good $M_W^{\rm had}$, $m_t^{\rm had}$ and $m_t^{\rm lep}$ reconstruction. With these
selection cuts a statistical significance $\mathcal{S}_{20} = 0.39\sigma$ is found for 30 fb$^{-1}$. This sensitivity is
much lower than in previous ATLAS analyses [7, 8] but similar to the most recent one by
CMS, $\mathcal{S}_{20} = 0.47\sigma$ (see footnote 4). However, it must be noted that the CMS analysis uses full
detector simulation, including the electron and muon efficiencies not taken into account here. On the
other hand, the next-to-leading order cross section for $t\bar tH$ is used in that analysis, which is
1.5 times larger than the one taken here.
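The preference for stricter cuts once systematic uncertainties are considered can be seen with a small numerical sketch. It assumes that $\mathcal{S}_{20}$ denotes $S/\sqrt{B + (0.2\,B)^2}$, i.e. a 20% background normalisation uncertainty added in quadrature to the statistical one; this definition is not restated in this section, and the event counts below are purely hypothetical.

```python
import math


def significance(signal, background, syst=0.0):
    """S / sqrt(B + (syst * B)^2); syst = 0 gives the purely statistical S/sqrt(B)."""
    return signal / math.sqrt(background + (syst * background) ** 2)


# Two hypothetical working points with the same statistical significance
loose = {"signal": 300.0, "background": 10000.0}
tight = {"signal": 30.0, "background": 100.0}

for name, point in (("loose", loose), ("tight", tight)):
    s0 = significance(point["signal"], point["background"])
    s20 = significance(point["signal"], point["background"], syst=0.2)
    print(f"{name}: S0 = {s0:.2f}, S20 = {s20:.2f}")
```

Both working points have the same $S/\sqrt{B}$, but with a 20% systematic the term $(0.2\,B)^2$ dominates for the large-background point, so the tighter selection is clearly favoured.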
Table 4: Analysis I: number of events $N_{\rm cut}$ after the selection criteria in Eqs. (12).

We remark that the signal itself has additional higher order contributions $t\bar tHnj$, with
$n \geq 1$, which have not been included in the same way as $t\bar tnj$ because the implementation of
the matching prescription is not yet available (and also for consistency with the calculation
of $T\bar T$, in which only the lowest order $n = 0$ can be generated). When higher order processes
are included, there are two alternatives for the likelihood analysis: (i) keep using the $N_{\rm jet}$
distribution for $t\bar tH$ in Fig. 4, which suppresses $t\bar tnj$ but also $t\bar tHnj$ for larger $n$; (ii) use a
new $N_{\rm jet}$ distribution for $t\bar tHnj$, which may improve the results. The first option can always
be followed, and will of course lead to better results than the ones shown here (this is the
reason why we have not included any $k$ factors in the signals). Thus, the results shown here
are conservative. From the number of $t\bar tnj$ events in Table 4 we can estimate that the inclusion
of higher $t\bar tHnj$ processes would at least double the sensitivity. These comments also apply to
the case in which $N_{\rm jet}$ is not included in the likelihood but a cut on this variable is performed
(see the next section).

4 For a better comparison between both results, this number has been obtained summing the number of
events in the electron and muon channels in Ref. [9], rescaling them to 30 fb$^{-1}$ and assuming a 20% background
uncertainty.

The new Higgs signals from $T\bar T$ decays enhance the observability of the Higgs boson.
Despite the very different kinematics of this process, and the fact that the reconstruction is
aimed at identifying $t\bar tH$ production, $T\bar T$ events are more signal- than background-like, as
can be observed in Figs. 4-5. Hence, they are not very suppressed by the kinematical cuts,
and enhance the Higgs sensitivity by a factor of 6, $\mathcal{S}_{20} = 2.03\sigma$. This improvement is sufficient
to have hints of the Higgs boson with a luminosity of 30 fb$^{-1}$. However, one can do much
better with a dedicated reconstruction aiming to detect the new quark.

4.2 Analysis II: TT reconstruction 

The three different $T\bar T$ decay channels considered yield final states with four or six $b$ quarks,
and lead to signal events with four, five and six or more $b$-tagged jets. (Due to mistags, the
number of $b$ jets may occasionally be larger than the number of $b$ quarks at the partonic level.)
The number of events corresponding to each decay channel and number of $b$ jets are collected
in Table 5, including also the SM background.





                    Total      4 tags     5 tags    6 tags
$T\bar T$ (WH)      339.0      [...]      33.7      2.1
$T\bar T$ (HH)      262.7      166.0      76.5      20.2
$T\bar T$ (ZH)      130.5      97.9       27.3      5.3
Background          13158.9    12572.4    561.1     25.4

Table 5: Analysis II: Number of events (for 30 fb$^{-1}$) with four, five and six or more $b$ tags,
for each of the signal processes and the SM background.

The discovery potential is higher if signal and background samples are separated according
to their $b$ jet multiplicity. This is also convenient from the point of view of the signal
reconstruction. The two main signal channels,

$$ T\bar T \to W^+ b\, H\bar t \,/\, Ht\, W^- \bar b \to W^+ b\, W^- \bar b\, H \to \ell\nu b\, q\bar q' \bar b\; b\bar b \quad (WH) , $$
$$ T\bar T \to Ht\, H\bar t \to W^+ b\, W^- \bar b\, HH \to \ell\nu b\, q\bar q' \bar b\; b\bar b\; b\bar b \quad (HH) , \tag{13} $$

have four and six $b$ quarks in the final state, respectively, and different kinematics. Hence, for
the reconstruction the events are classified as follows:

• Events with four $b$ tags are assigned to the $WH$ mode and reconstructed accordingly.

• Events with six or more $b$ tags are assigned to the $HH$ mode.

• Events with five tags are assumed to belong to the $HH$ mode as well if there are at
least three non-$b$ jets (the sixth $b$ jet is taken to be one of the non-tagged ones). A small
fraction (about 5%) which only has two light jets is reconstructed as in the $WH$ mode.
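The classification rules listed above amount to a small decision function. The Python sketch below is a direct transcription; the function name and the string labels are illustrative choices, not part of the analysis code.

```python
def assign_decay_mode(n_b_tags, n_light_jets):
    """Return the reconstruction hypothesis ("WH" or "HH") for an event.

    n_b_tags:     number of b-tagged jets in the event (at least four)
    n_light_jets: number of non-b-tagged jets
    """
    if n_b_tags < 4:
        raise ValueError("events with fewer than four b tags are not considered")
    if n_b_tags == 4:
        return "WH"
    if n_b_tags >= 6:
        return "HH"
    # Exactly five tags: the HH hypothesis needs two jets for the hadronic W
    # plus one untagged jet to play the role of the sixth b jet.
    return "HH" if n_light_jets >= 3 else "WH"
```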

This separation allows for a better reconstruction of $T\bar T$ (HH) events with five or more $b$ tags,
which amount to 36.6% of this channel and have a much smaller background. The remaining
$T\bar T$ (HH) events only have four tags, and they are reconstructed as in the $T\bar T$ (WH) channel
(see footnote 5). In both methods the heavy quark mass is not used, in order not to bias the SM
background towards this invariant mass value. The reconstruction is done by trying all possible
pairings for light and $b$ jets, and selecting the one which best resembles the kinematics of the decay
channel considered.

4.2.1 4b final states

We reconstruct the hadronic $W$ boson from a pair of light jets $j_1$ and $j_2$, and the leptonic $W$
from the charged lepton and missing transverse momentum. With the $W$ momenta determined
up to a twofold ambiguity, we identify the two $b$ quarks $b_T$ and $b_t$ coming from the decays
$T \to Wb$, $t \to Wb$. There are 24 possibilities for the pairing, because: (i) the heavy quark
decaying to $Wb$ (irrespective of whether it is $T$ or $\bar T$) may have the $W$ boson decaying
hadronically or leptonically; (ii) the quark $b_T$ may correspond to each one of the four $b$-tagged
jets in the final state, and the three remaining ones are then produced in the cascade decay
$T \to Ht \to b\bar b\, Wb$; (iii) the quark $b_t$ from the top decay can be any of the latter three. Among
the 48 resulting possibilities (the 24 pairings times the twofold ambiguity in the leptonic $W$
momentum, plus different choices of $j_1$ and $j_2$), we select the one minimising
the quantity

$$ \Delta m^2_{WH} = \frac{(m_T^{\rm had} - m_T^{\rm lep})^2}{\sigma_T^2} + \frac{(m_t^{\rm rec} - m_t)^2}{\sigma_t^2} + \frac{(M_W^{\rm had} - M_W)^2}{\sigma_W^2} , \tag{14} $$

where $m_t^{\rm rec}$ corresponds to the intermediate top quark (which may decay hadronically or
leptonically), and $m_T^{\rm had}$, $m_T^{\rm lep}$ are the reconstructed masses of the hadronic and leptonic $T$
quarks (independently of whether they decay to $Wb$ or $Ht$). $\sigma_T$, $\sigma_t$ and $\sigma_W$ are taken as
$\sigma_T = 100$ GeV, $\sigma_t = 20$ GeV and $\sigma_W = 10$ GeV. No cuts are applied at this level. For the
best pairing, the two remaining $b$ jets not assigned to the $T$ and $t$ decays correspond to the
Higgs boson. The reconstructed masses are shown in Fig. 6 for the sum of signal channels
and the SM background.

5 We have also tried a reconstruction of the $HH$ channel with only four $b$ jets. This requires taking two light
jets (among the many ones present in general) as if they were $b$ jets, with a minimum of 2160 combinations (for
a minimum of four light jets) for the reconstruction. For events with four $b$ jets, we have thus tried a mixed
procedure, selecting the channel which best fits the event kinematics. This improves the mass distributions for
the $T\bar T$ (HH) signal but slightly degrades them for the $T\bar T$ (WH) channel and concentrates the background
in the region of interest $M_H^{\rm rec} = 100 - 140$ GeV, giving worse results.
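The counting of pairings in the $4b$ reconstruction can be checked by explicit enumeration. The short Python sketch below does so; it interprets the step from 24 to 48 as the twofold ambiguity in the leptonic $W$ momentum, which is my reading of the text rather than an explicit statement, and the names are illustrative.

```python
import itertools


def wh_pairings(n_b_tags=4):
    """Enumerate jet assignments for the WH hypothesis (light-jet choice not included)."""
    pairings = []
    for w_side in ("hadronic", "leptonic"):  # which W comes from T -> Wb
        # ordered pair: b_T from T -> Wb, b_t from the top decay in T -> Ht
        for b_T, b_t in itertools.permutations(range(n_b_tags), 2):
            higgs = tuple(i for i in range(n_b_tags) if i not in (b_T, b_t))
            pairings.append((w_side, b_T, b_t, higgs))
    return pairings


pairings = wh_pairings()
n_neutrino_solutions = 2  # assumed origin of the factor between 24 and 48
print(len(pairings), len(pairings) * n_neutrino_solutions)
```

For four $b$-tagged jets this gives 2 x 4 x 3 = 24 pairings and 48 candidates once both $W$ momentum solutions are tried.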

We build signal and background likelihood functions using: 

• The reconstructed masses $m_T^{\rm had}$, $m_T^{\rm lep}$.

• Variables characterising the high transverse momentum of the signal: the total transverse
energy $H_T$, the missing energy $\not\!p_t$, the maximum and second maximum $p_t$ of the
$b$ jets, $p_t^{b,\rm max}$ and $p_t^{b,\rm max2}$, and the second maximum $p_t$ of the light jets, $p_t^{j,\rm max2}$.

• The energy of the charged lepton in the heavy quark rest frame, $E_\ell$. This distribution
has a long tail for $T\bar T$ (WH) signal events, not only because of the large $T$ mass but
also due to spin effects [46].

• The smallest invariant mass of a $bb$ pair, $m_{bb}^{\rm min}$, and the second smallest one.

• Angular quantities characterising the topology of the event: the azimuthal angle and
rapidity difference (i) between the two $b$ jets assigned to the Higgs, $\Delta\phi_{bb}$ and $\Delta\eta_{bb}$; (ii)
between the Higgs and the reconstructed top quark, $\Delta\phi_{Ht}$ and $\Delta\eta_{Ht}$; (iii) between the
Higgs and its parent $T$ quark, $\Delta\eta_{HT}$.

The distributions of these variables are presented in Figs. 7, 8. We remark again that the
selection of variables is not arbitrary, and some variables not considered, e.g. the transverse
momentum of the charged lepton or the maximum transverse momentum of the light jets,
have not been included because they actually reduce the discriminating power with respect to
the set of variables above. This surprising fact is due to the correlation among variables, and
is further explained in the appendix. We distinguish three likelihood classes: the $T\bar T$ (WH)
and $T\bar T$ (HH) signals and the background. The signal likelihood is defined as the sum of the
likelihoods of the two signal classes, $L_S = L_{S_1} + L_{S_2}$. The logarithm of $L_S/L_B$ is plotted
in Fig. 8. We observe that the $T\bar T$ (WH) distributions are in general more distinguishable
from the background than the $T\bar T$ (HH) ones. This results in a cleaner separation between
$T\bar T$ (WH) and the background.

For event selection we again use a hybrid method, with cuts on reconstructed masses, jet 
multiplicity and signal likelihood. The selection criteria are 

$$ \log_{10} L_S/L_B > 3.9 , \qquad N_{\rm jet} < 7 , $$
$$ 100~{\rm GeV} < M_H^{\rm rec} < 140~{\rm GeV} , \qquad 350~{\rm GeV} < m_T^{\rm had}, m_T^{\rm lep} < 650~{\rm GeV} . \tag{15} $$

The numbers of events after these cuts are collected in Table 6. The $t\bar tnj$ background with
larger $n$ has larger transverse momenta and is less affected by the cut on likelihood, but it
is suppressed by the cut on jet multiplicity. $W/Z$ plus jets is insignificant. We also note the
smaller efficiency for the $T\bar T$ (HH) signal, expected since its likelihood function has a larger
overlap with the background, see Fig. 8. Additionally, $T\bar T$ (HH) decays with four $b$-tagged
jets have a larger light jet multiplicity, and are more affected by the requirement $N_{\rm jet} < 7$.
The same comments made in the preceding subsection regarding the cut on $N_{\rm jet}$ and higher
order signal processes apply here.

Figure 6: Analysis II (4b final states): Reconstructed masses of the hadronic $W$, the top
quark, the hadronic and leptonic heavy quarks and the Higgs boson, for the background and
the sum of $T\bar T$ Higgs signals.
Before calculating the statistical significance of the Higgs signals from $T\bar T$ decays it is
important to draw attention to the fact that, since neither the $T$ quark nor the Higgs boson
has been discovered at present, there are two possible definitions for what we consider as
signal and background. The first one would be to take as background just the SM processes
in Table 3 (excluding $t\bar tH$), and for the signal $t\bar tH$, $T\bar T$ (in all decay modes) and TTbb. The
second possibility is to take as background the SM processes (slightly modified by the presence
of the heavy quark) plus $T\bar T$ (WZ, ZZ) and TTbb in the absence of a Higgs boson (see Table
3). Signal plus background is then constituted by the SM processes, plus $T\bar T$ (WZ, ZZ) and
TTbb with a Higgs boson, and Higgs production processes $t\bar tH$ and $T\bar T$ (WH, HH, ZH). The
"signal", that is, the excess of events over the background, is thus $t\bar tH$ plus $T\bar T$ (WH, HH, ZH)
plus the difference between $T\bar T$ (WZ, ZZ) and TTbb with and without a Higgs boson, that is,

$$ B = {\rm SM~bkg.} + T\bar T(WZ, ZZ; \not\!H) , $$
$$ S = t\bar tH(T) + T\bar T(WH, HH, ZH) + [\, T\bar T(WZ, ZZ) - T\bar T(WZ, ZZ; \not\!H) \,] + \Delta\,{\rm SM~bkg.} \tag{16} $$

The term in brackets is always negative, and the difference in SM background is negligible.
Both conventions lead to appreciably different results, and we adopt the latter, which is more
conservative. (This amounts to considering that the $T$ quark will have been discovered before
the Higgs boson.) With this definition, the signal significance is $\mathcal{S}_{20} = 6.43\sigma$, including a 20%
systematic error.
Table 6: Analysis II (4b final states): Number of events (for 30 fb$^{-1}$) after the kinematical
cuts in Eq. (15).



4.2.2 5b and 6b final states

Reconstructing the decay $T\bar T \to Ht\, H\bar t \to H W^+ b\, H W^- \bar b$ requires identifying six $b$ jets in
the final state. In the case of five $b$ tags, a light jet $j_b$ (if there are at least three) may be
assumed to come from a $b$ quark as well. The hadronic $W$ boson is reconstructed from a
pair of untagged jets $j_1$ and $j_2$. The leptonic $W$ is reconstructed from the charged lepton
momentum and missing energy. Each $W$ boson is associated to three $b$ jets to reconstruct
the momenta of the $T$ quarks (there are 20 combinations). For each choice, there are 3 x 3
possibilities to associate two $b$ jets to the hadronic and leptonic $W$, in order to reconstruct
the two top quarks. The two remaining pairs of $b$ jets $(b_1, b_2)$, $(b_3, b_4)$ are assumed to come
from the decays of the two Higgs bosons, with reconstructed masses $M_{H_1}^{\rm rec} = m_{b_1 b_2}$ (associated
to the hadronic top), $M_{H_2}^{\rm rec} = m_{b_3 b_4}$ (associated to the leptonic one). Among the 360 resulting
possibilities (plus different choices of $j_1$, $j_2$ and $j_b$), we select the one minimising the quantity

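$$ \Delta m^2_{HH} = \frac{(m_T^{\rm had} - m_T^{\rm lep})^2}{\sigma_T^2} + \frac{(M_{H_1}^{\rm rec} - M_{H_2}^{\rm rec})^2}{\sigma_H^2} + \frac{(m_t^{\rm had} - m_t)^2}{\sigma_t^2} + \frac{(m_t^{\rm lep} - m_t)^2}{\sigma_t^2} + \frac{(M_W^{\rm had} - M_W)^2}{\sigma_W^2} . \tag{17} $$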

We take $\sigma_T = 100$ GeV, $\sigma_t = 20$ GeV, $\sigma_W = \sigma_H = 10$ GeV. No cuts are applied at this level.
The reconstructed masses are shown in Fig. 9 for the sum of the signal channels and the SM
background. We define the reconstructed Higgs mass as the average of $M_{H_1}^{\rm rec}$ and $M_{H_2}^{\rm rec}$. In
this way, a sharper peak is obtained.
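The combinatorics of the $HH$ reconstruction can also be verified by enumeration. In the Python sketch below the factor of two between the 180 jet assignments and the quoted 360 possibilities is attributed to the twofold ambiguity in the leptonic $W$ momentum, and the 2160 of footnote 5 to the six ways of promoting two out of four light jets to $b$ jets; both attributions are assumptions consistent with the quoted numbers, not statements taken from the text.

```python
import itertools
import math


def hh_pairings(n_b=6):
    """Enumerate b-jet assignments for T Tbar -> Ht Ht -> HWb HWb."""
    jets = range(n_b)
    out = []
    for had_side in itertools.combinations(jets, 3):  # three b jets with the hadronic W
        lep_side = tuple(j for j in jets if j not in had_side)
        for b_top_had in had_side:                    # b jet paired with the hadronic W
            for b_top_lep in lep_side:                # b jet paired with the leptonic W
                higgs_1 = tuple(j for j in had_side if j != b_top_had)
                higgs_2 = tuple(j for j in lep_side if j != b_top_lep)
                out.append((had_side, b_top_had, b_top_lep, higgs_1, higgs_2))
    return out


assignments = hh_pairings()
n_triplets = math.comb(6, 3)           # 20 ways to split six b jets between the two sides
n_total = len(assignments) * 2         # two leptonic W momentum solutions (assumed)
n_four_tags = n_total * math.comb(4, 2)  # choose two of four light jets as extra b jets
print(n_triplets, len(assignments), n_total, n_four_tags)
```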

In these final states the SM background is already very small, and performing kinematical
cuts on reconstructed Higgs and heavy quark masses or light jet multiplicity can easily reduce
the signal significance. Therefore, for this analysis we include these variables in the likelihood
functions, and only perform loose cuts on the signal likelihood. The variables used are $m_T^{\rm had}$,
$m_T^{\rm lep}$, $M_H^{\rm rec}$, $m_{bb}^{\rm min}$, $H_T$, $p_t^{b,\rm max}$, $p_t^{b,\rm max2}$ and $p_t^{j,\rm max2}$, defined in the previous subsection, the jet
multiplicity and the charged lepton transverse momentum $p_t^{\ell}$. We only use two classes,
for the $T\bar T$ (HH) signal and the background, and the same distributions are used for final
states with 5 and 6 $b$ quarks. The normalised variables are presented in Fig. 10, except $H_T$ and
$p_t^{b,\rm max}$, which are very similar to the plots in Fig. 7, and the jet multiplicity, shown in Fig. 4.
The log-likelihood function is also presented in Fig. 10. We point out that the signal likelihood
for the $T\bar T$ (WH) and $T\bar T$ (ZH) processes is very high even without using a separate class for
them.

We suppress the background by requiring 

$$ \log_{10} L_S/L_B > 2.6 \quad (5b) , \qquad \log_{10} L_S/L_B > 0 \quad (6b) . \tag{18} $$

The numbers of events after these cuts are collected in Table 7. For 30 fb$^{-1}$ of luminosity,
the statistical significance of the Higgs signal is $\mathcal{S}_{20} = 6.02\sigma$ and $\mathcal{S}_{20} = 5.63\sigma$ for $5b$ and $6b$ final
states, respectively. We observe that the $t\bar tb\bar b$ background acquires increasing relevance in
these final states with five and six $b$-tagged jets. In order to have a good estimate of the
effect of higher order processes $t\bar tb\bar bj$, $t\bar tb\bar bjj$, etc. we have included a factor $k = 2.05$ into
its tree-level cross section, as explained in section 3. However, the kinematics of the higher
order processes might be important and a detailed simulation (when a Monte Carlo generator
including a matching prescription for these processes is available) is needed to confirm these
results. Besides, we have explicitly checked that the $t\bar tb\bar bb\bar b$ background, not included in our
simulations, is negligibly small.

Figure 9: Analysis II (5b, 6b final states): Reconstructed masses of the hadronic $W$, the
hadronic and leptonic top and heavy quarks and the Higgs boson, for the background and the
sum of $T\bar T$ Higgs signals.
Figure 10: Analysis II (5b, 6b final states): Normalised variables $m_T^{\rm had}$, ...

Table 7: Analysis II (5b, 6b final states): Number of events (for 30 fb$^{-1}$) after the selection
cuts in Eqs. (18).



4.2.3 Summary 

For a luminosity of 30 fb$^{-1}$, the statistical significances of the three channels (including a 20%
background systematic uncertainty) are

$$ 4b:\ \mathcal{S}_{20} = 6.43\sigma , \qquad 5b:\ \mathcal{S}_{20} = 6.02\sigma , \qquad 6b:\ \mathcal{S}_{20} = 5.63\sigma . \tag{19} $$

When the three channels are combined, a statistical significance of $10.45\sigma$ is obtained for the
$T\bar T$ Higgs signals. This is a factor of 25 better than for $t\bar tH$ production, and offers a good
opportunity to quickly discover the Higgs boson in final states containing a charged lepton and
four or more $b$ quarks. Rescaling the expected signal and background rates (and using Poisson
statistics) it is found that a $5\sigma$ discovery could be achieved with approximately 8 fb$^{-1}$. This
represents a reduction in luminosity by more than one order of magnitude with respect to $t\bar tH$
production in all $t\bar t$ decay channels, and might be improved with less restrictive selection cuts.
This high sensitivity is due not only to the large $T\bar T$ cross section, but also to the distinctive
features of this signal, characterised by large transverse momenta, high $b$ jet multiplicity and
reconstructed invariant masses peaking at $m_T$. At any rate, a likelihood analysis must be
employed to benefit from the distinctive kinematics and separate these signals from the $t\bar tnj$
background, which also involves large transverse momenta for higher values of $n$.
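The combined significance quoted above is numerically consistent with adding the three channel significances in quadrature, as one would do for statistically independent samples. The Python check below makes this explicit; that the combination was actually performed in this way is an inference from the numbers, not something stated in the text.

```python
import math


def combine_in_quadrature(significances):
    """Combine independent channels as sqrt(sum of squares)."""
    return math.sqrt(sum(s * s for s in significances))


higgs_channels = {"4b": 6.43, "5b": 6.02, "6b": 5.63}  # per-channel values quoted in the text
combined = combine_in_quadrature(higgs_channels.values())
print(f"{combined:.2f}")  # 10.45
```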
We finally comment on the experimental observation of the Higgs boson from $T\bar T$ decays.
Although the reconstruction of the final state does not explicitly make use of the new quark
mass, the distributions used in the probabilistic analysis do. Since the mass of an eventual
heavy quark $T$ is unknown, two alternatives are possible for the experimental search: (i)
generate sets of distributions and build likelihood functions for different values of $m_T$ and
compare them with real data; (ii) set generic kinematical cuts and look for peaks in the
invariant mass distributions. The second approach gives sensitivities similar to or worse than the
ones obtained in this section, and the analysis has been omitted for brevity. For illustration,
in the next section we will show how the new quark can be discovered with the observation
of peaks in the $m_T^{\rm had}$, $m_T^{\rm lep}$ distributions.



5 Heavy quark discovery 

Discovering the Higgs boson from $T\bar T$ decays implies the discovery of the new quark. However,
as emphasised in the paragraph before Eq. (16), the significances for the Higgs and $T$ quark
discoveries are different, due to the different classification of signals and backgrounds. Using
the data in Tables 6, 7 and taking $t\bar tH$ as part of the background, the significances for $T$
discovery with 30 fb$^{-1}$ are

$$ 4b:\ \mathcal{S}_{20} = 6.93\sigma , \qquad 5b:\ \mathcal{S}_{20} = 7.09\sigma , \qquad 6b:\ \mathcal{S}_{20} = 6.28\sigma , \tag{20} $$

with a combined significance $\mathcal{S}_{20} = 11.74\sigma$. $5\sigma$ evidence of the new quark (always assuming
$m_T = 500$ GeV) could be achieved with 7 fb$^{-1}$.

It is also interesting to discover the new quark by observing peaks in the $m_T^{\rm had}$, $m_T^{\rm lep}$
distributions. Quantifying the confidence level of such peaks, so as to claim discovery, requires
an appropriate background normalisation. The procedure used here follows and extends the
one proposed in Ref. [47] for detecting anomalous couplings. Performing a $\chi^2$ fit to the binned
data, a background rescaling factor $\kappa$ can be obtained by minimising the quantity

^£^!, 

i 

where $i$ sums over the bins, $N_i$ are the numbers of events observed and $B_i$ the expected
background. The minimum is found for

i 

where $B = \sum_i B_i$ is the total expected background. Since in a real experiment the number
of events observed will include not only the background but also a part from the signal itself,
in most cases $\kappa > 1$ will be found. The uncertainty in this normalisation factor is given by



5k 2 



B + B 2 ^B? 



1/2 



(23) 



For a single bin we have $\kappa = N/B$, $\delta\kappa/\kappa = 1/\sqrt{B}$, as expected. The statistical significance
of the signal at the peak is

$$ \mathcal{S}_\kappa = \frac{S'}{\sqrt{\kappa B + (\delta\kappa\, B)^2}} , \tag{24} $$

where $S' < S$ is the excess of events over the rescaled background. The second term in the
square root is a background normalisation systematic error, arising from the uncertainty in
the determination of $\kappa$. For a sufficiently large number of events, $\delta\kappa \simeq \kappa/\sqrt{B}$ is smaller than
the assumed 20% systematic error in the total cross section. On the other hand, this approach
has the drawback that the significance is determined by $S'$, which may be significantly smaller
than $S$ if off-peak signal contributions (combinatorial background) are large, and the "effective"
statistical error in the background is $\sqrt{\kappa B}$. Besides, this background rescaling assumes that
the main sources of systematic error (e.g. $b$ and light jet tagging efficiencies, jet energy
resolution, etc.) do not significantly affect the shape of the relevant distribution in which the
peak is observed.
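The peak significance with a rescaled background can be written as a one-line function. The Python sketch below implements $S'/\sqrt{\kappa B + (\delta\kappa\, B)^2}$ as read from the text; the function name and the numerical inputs are hypothetical and do not correspond to any entry in the tables.

```python
import math


def peak_significance(excess, background, kappa, delta_kappa):
    """Significance of an excess over a background rescaled by kappa.

    excess:      S', events above the rescaled background in the peak window
    background:  B, expected (unrescaled) background in the peak window
    kappa:       background rescaling factor from the sideband fit
    delta_kappa: uncertainty on kappa
    """
    statistical = kappa * background              # variance of the rescaled background
    systematic = (delta_kappa * background) ** 2  # normalisation uncertainty
    return excess / math.sqrt(statistical + systematic)


# Hypothetical inputs chosen so that the arithmetic is easy to follow:
# kappa * B = 64, (delta_kappa * B)^2 = 36, so the denominator is 10.
print(peak_significance(excess=50.0, background=64.0, kappa=1.0, delta_kappa=0.09375))
```

Setting `delta_kappa` to zero recovers the purely statistical estimate, which makes the size of the normalisation penalty easy to inspect.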

The probabilistic analysis in section 4.2 is not the best suited for detecting the peaks in
the $m_T^{\rm had}$, $m_T^{\rm lep}$ distributions. Even without including these variables in the likelihood functions,
requiring a high signal likelihood biases the background, concentrating the distributions of
$m_T^{\rm had}$ and $m_T^{\rm lep}$ around $m_T = 500$ GeV. This is not completely unexpected, since the signal
distributions of the total transverse energy, missing momentum, etc. have been obtained
assuming $m_T = 500$ GeV. Therefore, instead of a likelihood analysis we perform one based on
simple kinematical cuts. We restrict ourselves to final states with four $b$-tagged jets (in the $5b$ and
$6b$ channels the background rescaling has a larger uncertainty due to the smaller statistics).
We require

$$ H_T > 1000~{\rm GeV} , \qquad p_t^{b,\rm max} > 100~{\rm GeV} , \qquad N_{\rm jet} < 7 \tag{25} $$

to reduce the background. The reconstructed mass distributions obtained are presented in
Fig. 11. The SM background is normalised with cross section measurements in the regions
160 GeV < $m_T^{\rm had}$, $m_T^{\rm lep}$ < 360 GeV, 680 GeV < $m_T^{\rm had}$, $m_T^{\rm lep}$ < 840 GeV, obtaining similar
rescaling factors in both distributions, $\kappa = 1.139 \pm 0.051$ and $\kappa = 1.141 \pm 0.050$ respectively.
Within the mass windows

$$ 360~{\rm GeV} < m_T^{\rm had}, m_T^{\rm lep} < 640~{\rm GeV} \tag{26} $$

the significance of the signal (over the rescaled background) is $\mathcal{S}_\kappa = 4.27\sigma$. (The total number
of events after the cuts in Eqs. (25) and the events in the peak regions can be read in Table 8.)
In this example we find a smaller sensitivity with this method than with the probabilistic
analysis used in section 4.2, which was $\mathcal{S}_{20} = 6.93\sigma$. Nevertheless, it has the aesthetic
advantage of being able to observe the peaks corresponding to the new quark with unbiased
background.
Figure 11: Reconstructed heavy quark masses after the kinematical cuts in Eqs. (25). The
dotted lines represent the SM background, and the red lines the same but rescaled by a factor
$\kappa = 1.14$. The continuous lines correspond to the background plus all heavy quark signals.

Table 8: Number of events with 4 $b$ tags (for 30 fb$^{-1}$) after the selection cuts in Eqs. (25)
($N_{\rm cut}$) and also within the mass windows in Eq. (26) ($N_{\rm peak}$).

6 Other results



We conclude this analysis by examining the dependence of our results on some of our assumptions.
We can estimate how our results change if: (i) we use MRST structure functions [48]; (ii) we
include the charged lepton identification efficiency; (iii) we select $b$ tagging efficiencies of 50%
or 70%; (iv) a systematic uncertainty of 30% is assumed in the background. In the first case we
compute the significances rescaling the numbers of events in Tables 4, 6, 7 by factors reflecting
the change in the cross sections. In the second case we naively use an average charged lepton
identification efficiency of 90%. For the third, we provide crude estimates based on rescaling
by the nominal $b$ tagging efficiencies and rejection factors. The resulting significances for the
Higgs signals are collected in Table 9. For the $T$ discovery in the $4b$, $5b$ and $6b$ channels they
are slightly larger, as shown in the previous section.





                  $t\bar tH$    $T\bar T$ (H, 4b)    $T\bar T$ (H, 5b)    $T\bar T$ (H, 6b)
Standard          0.39          6.43                 6.02                 5.63
MRST              0.38          7.30                 [...]                6.45
$\ell$ eff. 90%   [...]         6.24                 [...]
$b$ eff. 50%      [...]         6.28                 [...]                4.80
$b$ eff. 70%      0.12          1.74                 1.70                 1.81
sys 30%           0.31          5.08                 4.72                 4.78

Table 9: Estimates of the Higgs signal significances under different assumptions, explained in
the text.

The results are rather stable except for a 70% $b$ tagging efficiency, where backgrounds
grow due to the larger mistagging rate. For a slightly different Higgs mass the results are
stable too, as long as the decay $H \to b\bar b$ dominates, and an additional (small) dependence on
$M_H$ is through the branching ratios for $T$ decays, plotted in Fig. 2. For larger $T$ masses the
signal is suppressed (and for lighter $T$ enhanced) as a consequence of the variation in the $T\bar T$
cross section, plotted in Fig. 1. For instance, for a heavy quark mass $m_T = 600$ GeV the cross
section is 776 fb, almost three times smaller than for $m_T = 500$ GeV. On the other hand, the
SM background decreases for larger transverse momenta, but the latter effect does not make
up for the reduction in the $T\bar T$ cross section.

7 Summary 

Heavy singlet decays were recognised early as an important source of Higgs bosons [14], with a
branching ratio close to 25% for $M_H \ll m_T$. In this work we have addressed their experimental
observation at LHC, assuming a Higgs mass of 115 GeV and the possible existence of a
500 GeV heavy quark $T$. We have performed a detailed signal and background study, with
matrix-element-based generators for the hard processes, subsequent parton showering and
hadronisation by PYTHIA and a fast simulation of the ATLAS detector. As a by-product,
new leading-order event generators for $t\bar tb\bar b$, $t\bar tc\bar c$, $t\bar tH$, $Wb\bar bb\bar b$ and other processes have been
developed. These generators include top quark, $W$ and Higgs boson decays and take finite
width and spin effects into account. Their output provides the colour information necessary
for hadronisation.

In our analysis we have first reevaluated the discovery potential of $t\bar tH$ production, with
$H \to b\bar b$ and semileptonic decay of the $t\bar t$ pair, in the SM. Our result, $0.4\sigma$ significance for
30 fb$^{-1}$ in low luminosity running, is similar to the most recent one by CMS, although the
details of the analysis (full simulation for the CMS analysis, with inclusion of a $K$ factor
for $t\bar tH$) differ. Both results are substantially more pessimistic than earlier ones [5,7,8], because
in previous studies only the lowest orders of the leading $t\bar tnj$ background were taken
into account, and systematic uncertainties in the background normalisation were not considered.
The $b$ tagging performance has a large impact on the final result, especially regarding
the dominant $t\bar tnj$ background. We have used the efficiencies implemented in ATLFASTB for
the low luminosity run: 60% $b$ tagging rate and nominal rejection factors of 6.7 for charm
and 93 for light jets (with $p_t$-dependent corrections). If the latter are better than expected,
the observability of $t\bar tH$ production will improve. In this respect, full simulations of
matrix-element-generated signals and backgrounds would be welcome, but it is not likely that results
will attain observability of $t\bar tH$. Results also depend to some extent on the ability to reconstruct
invariant masses. With a full simulation the mass reconstruction may be degraded,
although studies performed for top pair production have shown good agreement between fast
and full simulations, not only for reconstructed masses but also for angular distributions [42].
On the other hand, it must be pointed out that our results are conservative in the sense that
higher multiplicity backgrounds $t\bar tnj$ are included but not higher multiplicity signal processes
$t\bar tHnj$. The latter might improve the observability by a factor of two.

New Higgs signals from $T\bar T$ decays, $T\bar T \to W^+ b\, H\bar t \,/\, Ht\, W^- \bar b$, $T\bar T \to Ht\, H\bar t$ and $T\bar T \to
Zt\, H\bar t \,/\, Ht\, Z\bar t$, have then been examined. We have demonstrated that, in a standard search for
$t\bar tH$ production, a possible contribution of these processes can easily be overlooked, and does not
much improve the Higgs observability. We have presented a novel reconstruction technique
specific to the search for the leading signals $T\bar T \to W^+ b\, H\bar t \,/\, Ht\, W^- \bar b \to W^+ b\, W^- \bar b\, H$, $T\bar T \to
Ht\, H\bar t \to W^+ b\, W^- \bar b\, HH$, which does not require knowledge of the heavy quark mass. Despite
their different kinematics and large transverse momenta, these signals are not easy to isolate
from the $t\bar tnj$ background, which is large and also involves larger transverse momenta for
increasing values of $n$. Using a likelihood analysis, these processes are cleanly separated from
the SM background, giving a high statistical significance for the Higgs, $10.4\sigma$ for 30 fb$^{-1}$
including a 20% systematic uncertainty in the background normalisation. In the case that a
500 GeV $T$ quark exists, 8 fb$^{-1}$ of luminosity could suffice to discover the Higgs boson. This
striking signal is due to the large $T\bar T$ production cross section (2.14 pb for $m_T = 500$ GeV),
the large branching ratio for final states with Higgs bosons, Br($T\bar T \to H + X$) = 0.55, and
the distinctive features of these processes: in addition to larger transverse momenta, a high $b$
jet multiplicity in the final state and reconstructed invariant masses peaking at $m_T$.

Finally, we have addressed the observability of the new quark, which is not equivalent
to the discovery of the Higgs boson because the classification of processes as signals and
background differs. We have shown that a significance of 11.7$\sigma$ is reached for 30 fb$^{-1}$, similar
to the one in the $T\bar{T} \to W^+ b W^- \bar{b}$ channel (a detailed comparison between both channels is
difficult because of the different assumptions made in the two studies). We have also used a
standard analysis in order to show that the peaks in the invariant mass distributions of the
heavy quarks would be easy to observe, even considering the uncertainties in the background
normalisation. For higher T masses, $T\bar{T} \to W^+ b W^- \bar{b}$ is the leading discovery channel, due
to three facts: (i) the branching ratio for $\ell\nu bbbbjj$ final states decreases slightly with $m_T$;
(ii) for heavier T, the charged lepton from the semileptonic decay $T \to W^+ b \to \ell^+ \nu b$ (or the
charge conjugate) generically has a very large transverse momentum which can be exploited
to reduce backgrounds very efficiently [46]; (iii) larger T masses can only be explored in a high
luminosity LHC run, where b tagging performance is degraded and multi-jet backgrounds to
the $T\bar{T}$ (WH, HH, ZH) signals are larger.

A Probabilistic analysis

In the probabilistic analysis we build likelihood functions which use information from several
kinematical variables to discriminate between event classes, namely the signal (one or more)
and the background. For a given kinematical variable x, e.g. a transverse momentum, different
event classes j = 1, ..., m have different kinematical distributions $f_j(x)$, which we normalise
to unity. We define the "probability" function

$$ p_j(x) = \frac{f_j(x)}{\sum_{k=1}^{m} f_k(x)} . \qquad (27) $$

If the distributions $f_j$ are normalised to their total cross section, the function $p_j(x)$ represents
the probability that the event corresponds to the class j, and when they are normalised to unity it
is the relative probability (up to total cross section factors). For a set of kinematical variables
$x_i$, i = 1, ..., n, the likelihoods $L_j$ are then defined as the product of the probabilities for
each variable $x_i$,

$$ L_j(x_1,\dots,x_n) = \prod_{i=1}^{n} p_j^i(x_i) , \quad j = 1,\dots,m . \qquad (28) $$

Selection cuts may be applied on likelihood ratios $L_{S_i}/L_B$, for $S_i$ and $B$ the signal and
background classes, respectively, in order to enhance the signal(s). Alternatively, instead of
working directly with these ratios it is often more practical to consider the logarithm of these
quantities, $\log_{10} L_{S_i}/L_B$.
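
The definitions above can be sketched numerically as follows. This is a minimal illustration, assuming that the distributions of each variable are available as unit-normalised histograms. The function names, the binning and the toy histogram contents are all hypothetical and are not taken from the analysis.

```python
import numpy as np

def class_probabilities(densities):
    """Eq. (27): p_j(x) = f_j(x) / sum_k f_k(x), evaluated bin by bin.

    densities: array of shape (m, n_bins), one unit-normalised
               histogram f_j per event class.
    """
    f = np.asarray(densities, dtype=float)
    total = f.sum(axis=0)
    # Bins empty in every class carry no information: assign 1/m to each class.
    return np.divide(f, total, out=np.full_like(f, 1.0 / f.shape[0]),
                     where=total > 0)

def log10_likelihood_ratio(event, edges, densities, sig=0, bkg=1, eps=1e-12):
    """log10(L_sig / L_bkg), with L_j the product over variables of Eq. (28).

    event:     sequence of n variable values x_i
    edges:     list of n arrays of bin edges
    densities: list of n arrays of shape (m, n_bins_i)
    """
    log_ratio = 0.0
    for x, e, f in zip(event, edges, densities):
        p = class_probabilities(f)
        # Locate the bin of x, folding under/overflow into the edge bins.
        b = int(np.clip(np.searchsorted(e, x, side="right") - 1, 0, len(e) - 2))
        # eps guards against log10(0) in bins where one class is empty.
        log_ratio += np.log10(p[sig, b] + eps) - np.log10(p[bkg, b] + eps)
    return log_ratio

# Toy input: two variables, two classes (row 0 = signal, row 1 = background).
edges = [np.array([0.0, 50.0, 100.0, 200.0]), np.array([0.0, 1.0, 2.0])]
densities = [np.array([[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]),
             np.array([[0.2, 0.8], [0.5, 0.5]])]
event = (150.0, 1.5)
value = log10_likelihood_ratio(event, edges, densities)
print(value)
```

The common denominator of Eq. (27) cancels in any ratio of two likelihoods, so the ratio depends only on the $f_j$ of the two classes being compared. Working with the sum of logarithms rather than the product of probabilities avoids numerical underflow when many variables are used.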






Figure 12: Normalised variables $p_t^{\rm lep}$, $p_t^{j,\max}$ for the analysis II (4b final states), without cuts
and after requiring $\log_{10} L_S/L_B > 2$, with $L_S$, $L_B$ involving the rest of variables.



We emphasise that performing a probabilistic analysis of this type is not as straightforward
as one might think. Naively, one would take all the relevant variables which exhibit different
distributions for signal and background and build likelihood functions with them. But this is
not optimal and, perhaps surprisingly, some variables which one might consider relevant
actually reduce the discriminating power of the likelihood functions. This can be understood
as a result of the fact that some variables are correlated, and selecting values of one of them
modifies the distribution of the others. Let us take as an example the transverse momentum
distributions of the charged lepton ($p_t^{\rm lep}$) and the light jet with maximum $p_t$ ($p_t^{j,\max}$) for the
analysis II in 4b final states. These variables have not been included in the probabilistic
analysis in this case. Their normalised distributions before any cut are presented in Fig. 12
(left), and after requiring $\log_{10} L_S/L_B > 2$ (but without including them in the likelihood functions)
on the right. From their distributions in the left column we observe that their inclusion
in the likelihood functions would favour larger transverse momenta, since the background
distributions are peaked at lower $p_t$. But observing the right column we realise that this
would actually disfavour the signal over the background (for example, the tail in the $p_t^{j,\max}$
distribution after the likelihood cut is larger for the background than for the two signals, and
in the $p_t^{\rm lep}$ distribution larger for the background than for $T\bar{T}$ (HH)). These examples make
apparent that optimising the analysis requires educated guessing and trial and error to find
(or get close to) the best set of variables.
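
The role of correlations can be made explicit for two variables. The following is a general remark about products of one-dimensional probabilities, added as an illustration rather than as part of the analysis itself. The conditional densities $f_j(x_2 \mid x_1)$ are notation introduced here.

```latex
\frac{f_S(x_1,x_2)}{f_B(x_1,x_2)}
  = \frac{f_S(x_1)}{f_B(x_1)} \,
    \frac{f_S(x_2 \mid x_1)}{f_B(x_2 \mid x_1)} ,
\qquad \text{whereas the product of probabilities gives} \qquad
\frac{L_S}{L_B}
  = \frac{f_S(x_1)}{f_B(x_1)} \,
    \frac{f_S(x_2)}{f_B(x_2)} .
```

The two expressions agree only if $x_2$ is independent of $x_1$ within each class. Otherwise the conditional ratio can be below one in a region where the ratio of the one-dimensional distributions is above one, so that including $x_2$ shifts the likelihood ratio in the wrong direction. This matches the pattern described above for the transverse momenta after the likelihood cut.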